
    Graph Neural Networks are Inherently Good Generalizers: Insights by Bridging GNNs and MLPs

    Graph neural networks (GNNs), the de-facto model class for representation learning on graphs, are built upon the multi-layer perceptron (MLP) architecture with additional message passing layers that allow features to flow across nodes. While conventional wisdom commonly attributes the success of GNNs to their advanced expressivity, we conjecture that this is not the main cause of GNNs' superiority in node-level prediction tasks. This paper pinpoints the major source of GNNs' performance gain as their intrinsic generalization capability, by introducing an intermediate model class dubbed P(ropagational)MLP, which is identical to a standard MLP in training but adopts a GNN's architecture in testing. Intriguingly, we observe that PMLPs consistently perform on par with (or even exceed) their GNN counterparts, while being much more efficient to train. This finding sheds new light on the learning behavior of GNNs and can serve as an analytic tool for dissecting various GNN-related research problems. As an initial step towards analyzing the inherent generalizability of GNNs, we show that the essential difference between MLP and PMLP in the infinite-width limit lies in the NTK feature map in the post-training stage. Moreover, by examining their extrapolation behavior, we find that though many GNNs and their PMLP counterparts cannot extrapolate non-linear functions for extremely out-of-distribution samples, they have greater potential to generalize to testing samples near the training data range, a natural advantage of GNN architectures. Comment: Accepted to ICLR 2023. Code at https://github.com/chr26195/PML
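
    A minimal sketch of the PMLP idea as described above, assuming a two-layer model and a dense row-normalized adjacency; the GCN-style mean propagation is an illustrative stand-in for the GNN layer, and class and argument names are hypothetical, not the authors' code.

```python
import torch
import torch.nn as nn

class PMLP(nn.Module):
    """MLP during training; optionally adds message passing at test time."""

    def __init__(self, in_dim, hidden_dim, out_dim):
        super().__init__()
        self.layers = nn.ModuleList([
            nn.Linear(in_dim, hidden_dim),
            nn.Linear(hidden_dim, out_dim),
        ])

    def forward(self, x, adj=None, message_passing=False):
        # x: node features (N x in_dim); adj: row-normalized adjacency (N x N).
        for i, layer in enumerate(self.layers):
            x = layer(x)
            if message_passing and adj is not None:
                x = adj @ x  # GCN-style feature propagation across edges
            if i < len(self.layers) - 1:
                x = torch.relu(x)
        return x

# Training: model(x)                            -> behaves as a standard MLP
# Testing:  model(x, adj, message_passing=True) -> adopts the GNN architecture
```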

    Advective Diffusion Transformers for Topological Generalization in Graph Learning

    Graph diffusion equations are intimately related to graph neural networks (GNNs) and have recently attracted attention as a principled framework for analyzing GNN dynamics, formalizing their expressive power, and justifying architectural choices. One key open question in graph learning is the generalization capability of GNNs. A major limitation of current approaches hinges on the assumption that the graph topologies in the training and test sets come from the same distribution. In this paper, we take steps towards understanding the generalization of GNNs by exploring how graph diffusion equations extrapolate and generalize in the presence of varying graph topologies. We first show deficiencies in the generalization capability of existing models built upon local diffusion on graphs, stemming from their exponential sensitivity to topology variation. Our subsequent analysis reveals the promise of non-local diffusion, which advocates feature propagation over fully-connected latent graphs, under a specific data-generating assumption. In addition to these findings, we propose a novel graph encoder backbone, the Advective Diffusion Transformer (ADiT), inspired by advective graph diffusion equations that admit a closed-form solution and come with theoretical guarantees of the desired generalization under topological distribution shifts. The new model, functioning as a versatile graph Transformer, demonstrates superior performance across a wide range of graph learning tasks. Comment: 39 pages
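
    A schematic sketch of the layer structure the abstract suggests, assuming a dense normalized adjacency and a fixed mixing coefficient beta (both illustrative): global attention plays the role of non-local diffusion over a fully-connected latent graph, while propagation along the observed adjacency plays the role of the advective term. This is not the authors' exact parameterization.

```python
import torch
import torch.nn as nn

class AdvectiveDiffusionLayer(nn.Module):
    """Schematic layer: global attention as non-local diffusion, plus
    propagation along the observed graph as the advective term."""

    def __init__(self, dim, beta=0.5):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.beta = beta  # illustrative trade-off between the two terms

    def forward(self, x, adj):
        # x: node states (N x dim); adj: dense normalized adjacency (N x N).
        scores = self.q(x) @ self.k(x).T / x.size(-1) ** 0.5
        non_local = torch.softmax(scores, dim=-1) @ self.v(x)  # latent graph
        advective = adj @ self.v(x)                            # observed graph
        return x + self.beta * non_local + (1 - self.beta) * advective
```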

    DIFFormer: Scalable (Graph) Transformers Induced by Energy Constrained Diffusion

    Real-world data generation often involves complex inter-dependencies among instances, violating the IID-data hypothesis of standard learning paradigms and posing a challenge for uncovering the geometric structures needed to learn desired instance representations. To this end, we introduce an energy-constrained diffusion model which encodes a batch of instances from a dataset into evolutionary states that progressively incorporate other instances' information through their interactions. The diffusion process is constrained by descent criteria w.r.t. a principled energy function that characterizes the global consistency of instance representations over latent structures. We provide rigorous theory that implies closed-form optimal estimates of the pairwise diffusion strength between arbitrary instance pairs, which gives rise to a new class of neural encoders, dubbed DIFFormer (diffusion-based Transformers), with two instantiations: a simple version with linear complexity for prohibitively large instance numbers, and an advanced version for learning complex structures. Experiments highlight the wide applicability of our model as a general-purpose encoder backbone with superior performance in various tasks, such as node classification on large graphs, semi-supervised image/text classification, and spatial-temporal dynamics prediction. Comment: Accepted by International Conference on Learning Representations (ICLR 2023)
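
    A sketch of the linear-complexity instantiation in the spirit described above, assuming a (1 + q_i·k_j) similarity kernel with unit-normalized q and k: because the kernel factorizes, all-pair propagation can be computed in O(N d^2) without ever materializing the N x N diffusion matrix. The module name and exact kernel are illustrative assumptions, not the paper's precise formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleDiffusionLayer(nn.Module):
    """All-pair feature propagation with linear cost in the number of
    instances, via a factorizable (1 + q.k) similarity kernel."""

    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)

    def forward(self, x):
        q = F.normalize(self.q(x), dim=-1)   # N x d
        k = F.normalize(self.k(x), dim=-1)   # N x d
        # numerator: sum_j (1 + q_i . k_j) x_j, computed without an N x N matrix
        numer = x.sum(0, keepdim=True) + q @ (k.T @ x)
        # denominator: sum_j (1 + q_i . k_j); positive since q, k are unit norm
        denom = x.size(0) + q @ k.sum(0)
        return numer / denom.unsqueeze(-1)
```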

    Localized Contrastive Learning on Graphs

    Contrastive learning methods based on the InfoNCE loss are popular in node representation learning tasks on graph-structured data. However, their reliance on data augmentation and their quadratic computational complexity can lead to inconsistency and inefficiency. To mitigate these limitations, in this paper we introduce a simple yet effective contrastive model named Localized Graph Contrastive Learning (Local-GCL for short). Local-GCL consists of two key designs: 1) we fabricate the positive examples for each node directly from its first-order neighbors, which frees our method from reliance on carefully designed graph augmentations; 2) to improve the efficiency of contrastive learning on graphs, we devise a kernelized contrastive loss which can be approximately computed in time and space linear in the graph size. We provide theoretical analysis to justify the effectiveness and rationality of the proposed method. Experiments on datasets of various scales and properties demonstrate that, in spite of its simplicity, Local-GCL achieves highly competitive performance in self-supervised node representation learning tasks.
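
    A naive sketch of the Local-GCL objective under assumed dense tensors: first-order neighbors supply the positives, removing the need for augmentations. The negative term below is the plain O(N^2) version for clarity; the paper's kernelized linear-time approximation of that term is its key contribution and is omitted here.

```python
import torch
import torch.nn.functional as F

def local_gcl_loss(z, adj, tau=0.5):
    """z: N x d node embeddings; adj: dense binary adjacency (N x N).
    Positives come from first-order neighbors; negatives from all nodes."""
    z = F.normalize(z, dim=-1)
    deg = adj.sum(1, keepdim=True).clamp(min=1)
    pos = (adj @ z) / deg                    # mean neighbor embedding per node
    pos_sim = (z * pos).sum(-1) / tau        # alignment with neighborhood
    all_sim = z @ z.T / tau                  # naive O(N^2) similarity matrix
    neg_term = torch.logsumexp(all_sim, dim=-1)  # uniformity over negatives
    return (neg_term - pos_sim).mean()
```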

    The mechanisms of Yu Ping Feng San in tracking the cisplatin-resistance by regulating ATP-binding cassette transporter and glutathione S-transferase in lung cancer cells

    Cisplatin is one of the first-line anti-cancer drugs prescribed for the treatment of solid tumors; however, chemotherapeutic drug resistance remains a major obstacle to treating cancers with cisplatin. Yu Ping Feng San (YPFS), a well-known ancient Chinese herbal combination formula consisting of Astragali Radix, Atractylodis Macrocephalae Rhizoma and Saposhnikoviae Radix, is prescribed as a herbal decoction to treat immune disorders in the clinic. To understand the fast-onset action of YPFS as an anti-cancer agent against cisplatin resistance, we provide detailed analyses of intracellular cisplatin accumulation, cell viability, and the expression and activity of ATP-binding cassette transporters and glutathione S-transferases (GSTs) in YPFS-treated lung cancer cell lines. In cultured A549 cells or their cisplatin-resistant derivative A549/DDP, application of YPFS increased the accumulation of intracellular cisplatin, resulting in lower cell viability. In parallel, the activities and expression of ATP-binding cassette transporters and GSTs were down-regulated in the presence of YPFS. The expression of the p65 subunit of the NF-κB complex was reduced by treating the cultures with YPFS, leading to a high Bax/Bcl-2 ratio, i.e. an increased rate of cell death. Prim-O-glucosylcimifugin, one of the abundant ingredients of YPFS, modulated the activity of GSTs and thereby elevated cisplatin accumulation, resulting in increased cell apoptosis. The present results support the notion that YPFS reverses cisplatin resistance in lung cancer cells by elevating intracellular cisplatin, and the underlying mechanism may be down-regulation of the activities and expression of ATP-binding cassette transporters and GSTs.

    Handling Distribution Shifts on Graphs: An Invariance Perspective

    There is increasing evidence of neural networks' sensitivity to distribution shifts, bringing research on out-of-distribution (OOD) generalization into the spotlight. Nonetheless, current endeavors mostly focus on Euclidean data; the formulation for graph-structured data is unclear and remains under-explored, owing to two fundamental challenges: 1) the inter-connection among nodes in a graph, which induces non-IID generation of data points even under the same environment, and 2) the structural information in the input graph, which is also informative for prediction. In this paper, we formulate the OOD problem on graphs and develop a new invariant learning approach, Explore-to-Extrapolate Risk Minimization (EERM), that enables graph neural networks to leverage invariance principles for prediction. EERM resorts to multiple context explorers (specified as graph structure editors in our case) that are adversarially trained to maximize the variance of risks across multiple virtual environments. Such a design enables the model to extrapolate from a single observed environment, which is the common case for node-level prediction. We prove the validity of our method by theoretically showing that it guarantees a valid OOD solution, and we further demonstrate its power on various real-world datasets for handling distribution shifts arising from artificial spurious features, cross-domain transfer, and dynamic graph evolution. Comment: ICLR 2022, 30 pages
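
    A sketch of the predictor-side EERM objective, assuming each editor exposes a callable returning an edited graph (the interface is hypothetical): K structure editors produce virtual environments, the predictor minimizes the mean plus variance of per-environment risks, and the editors are updated adversarially, with the opposite sign on the variance term, in an alternating loop omitted here.

```python
import torch

def eerm_objective(model, editors, data, criterion, beta=1.0):
    """Mean + variance of risks over the K virtual environments
    generated by adversarially trained graph structure editors."""
    risks = []
    for editor in editors:
        x, edge_index, y = editor(data)   # hypothetical editor interface
        risks.append(criterion(model(x, edge_index), y))
    risks = torch.stack(risks)            # one risk per virtual environment
    return risks.mean() + beta * risks.var()
```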

    Multisensory information facilitates the categorization of untrained stimuli

    Although it has been demonstrated that multisensory information can facilitate object recognition and object memory, it remains unclear whether such a facilitation effect exists in category learning. To address this issue, comparable car images and sounds were first selected via a discrimination task in Experiment 1. Those selected images and sounds were then utilized in a prototype category learning task in Experiments 2 and 3, in which participants were trained with auditory, visual, and audiovisual stimuli, and were tested with trained or untrained stimuli from the same categories, presented alone or accompanied by a congruent or incongruent stimulus in the other modality. In Experiment 2, when low-distortion stimuli (more similar to the prototypes) were trained, accuracy was higher for audiovisual trials than visual trials, but there was no significant difference between audiovisual and auditory trials. During testing, accuracy was significantly higher for congruent trials than unisensory or incongruent trials, and the congruency effect was larger for untrained high-distortion stimuli than trained low-distortion stimuli. In Experiment 3, when high-distortion stimuli (less similar to the prototypes) were trained, accuracy was higher for audiovisual trials than visual or auditory trials, and the congruency effect was larger for trained high-distortion stimuli than untrained low-distortion stimuli during testing. These findings demonstrate that a higher degree of stimulus distortion results in a more robust multisensory effect, and that the categorization of not only trained but also untrained stimuli in one modality can be influenced by an accompanying stimulus in the other modality.

    Trading Hard Negatives and True Negatives: A Debiased Contrastive Collaborative Filtering Approach

    Collaborative filtering (CF), as a standard method for recommendation with implicit feedback, tackles a semi-supervised learning problem where most interaction data are unobserved. This nature makes existing approaches rely heavily on mining negatives to provide correct training signals. However, mining proper negatives is not a free lunch: it involves a tricky trade-off between mining informative hard negatives and avoiding false ones. We devise a new approach named Hardness-Aware Debiased Contrastive Collaborative Filtering (HDCCF) to resolve this dilemma. It sufficiently explores hard negatives from two aspects: 1) adaptively sharpening the gradients of harder instances through a set-wise objective, and 2) implicitly leveraging item/user frequency information with a new sampling strategy. To circumvent false negatives, we develop a principled approach that improves the reliability of negative instances and prove that the objective is an unbiased estimate of sampling from the true negative distribution. Extensive experiments demonstrate the superiority of the proposed model over existing CF models and hard negative mining methods. Comment: in IJCAI 202
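
    A sketch combining the two ingredients the abstract names, using the standard debiased contrastive estimator (Chuang et al., 2020) as a stand-in for the paper's exact objective: exponentiated scores concentrate the gradient on harder negatives, and a positive-class prior tau_plus corrects for sampled negatives that are actually unobserved positives. All names and hyperparameters are illustrative.

```python
import math
import torch

def debiased_contrastive_cf_loss(user, pos, negs, tau=0.2, tau_plus=0.05):
    """user: B x d user embeddings; pos: B x d positive-item embeddings;
    negs: B x K x d sampled negative-item embeddings."""
    pos_sim = torch.exp((user * pos).sum(-1) / tau)                     # B
    neg_sim = torch.exp(torch.einsum('bd,bkd->bk', user, negs) / tau)   # B x K
    k = negs.size(1)
    # Remove the expected false-negative mass from the negative term;
    # the clamp keeps the estimator positive.
    neg_term = (neg_sim.mean(-1) - tau_plus * pos_sim) / (1 - tau_plus)
    neg_term = neg_term.clamp(min=math.exp(-1 / tau))
    return -torch.log(pos_sim / (pos_sim + k * neg_term)).mean()
```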